NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale

نویسندگان

Daniel Kang

John Emmons

Firas Abuzaid

Peter Bailis

Matei Zaharia

چکیده

Video is one of the fastest-growing sources of data and is rich with interesting semantic information. Furthermore, recent advances in computer vision, in the form of deep convolutional neural networks (CNNs), have made it possible to query this semantic information with near-human accuracy (in the form of image tagging). However, performing inference with state-of-the-art CNNs is computationally expensive: analyzing videos in real time (at 30 frames/sec) requires a $1200 GPU per video stream, posing a serious computational barrier to CNN adoption in large-scale video data management systems. In response, we present NOSCOPE, a system that uses cost-based optimization to assemble a specialized video processing pipeline for each input video stream, greatly accelerating subsequent CNNbased queries on the video. As NOSCOPE observes a video, it trains two types of pipeline components (which we call filters) to exploit the locality in the video stream: difference detectors that exploit temporal locality between frames, and specialized models that are tailored to a specific scene and query (i.e., exploit environmental and query-specific locality). We show that the optimal set of filters and their parameters depends significantly on the video stream and query in question, so NOSCOPE introduces an efficient cost-based optimizer for this problem to select them. With this approach, our NOSCOPE prototype achieves up to 120-3,200× speed-ups (3188,500× real-time) on binary classification tasks over real-world webcam and surveillance video while maintaining accuracy within 1-5% of a state-of-the-art CNN.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

NoScope: Optimizing Neural Network Queries over Video at Scale

Recent advances in computer vision—in the form of deep neural networks—have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NOSCOPE, a system ...

متن کامل

A Multi-scale Multiple Instance Video Description Network

Generating natural language descriptions for in-thewild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (Alexnet, Googlenet) to extract a visual representation of the input video. However, these deep CNN architectures are designed for single-label centered-positioned object classification....

متن کامل

ارائه روشی پویا جهت پاسخ به پرس‌وجوهای پیوسته تجمّعی اقتضایی

Data Streams are infinite, fast, time-stamp data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

متن کامل

Deep Learning Architectures for Face Recognition in Video Surveillance

Face recognition (FR) systems for video surveillance (VS) applications attempt to accurately detect the presence of target individuals over a distributed network of cameras. In video-based FR systems, facial models of target individuals are designed a priori during enrollment using a limited number of reference still images or video data. These facial models are not typically representative of ...

متن کامل

Reducing Complexity of HEVC: A Deep Learning Approach

High Efficiency Video Coding (HEVC) significantly reduces bit-rates over the preceding H.264 standard but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of coding unit (CU) consumes a large proportion of the HEVC encoding complexity, due to the bruteforce search for rate-distortion optimization (RDO). Therefore, this paper proposes a deep learning approac...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

PVLDB

دوره 10 شماره

صفحات -

تاریخ انتشار 2017

NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale

نویسندگان

چکیده

منابع مشابه

NoScope: Optimizing Neural Network Queries over Video at Scale

A Multi-scale Multiple Instance Video Description Network

ارائه روشی پویا جهت پاسخ به پرس‌وجوهای پیوسته تجمّعی اقتضایی

Deep Learning Architectures for Face Recognition in Video Surveillance

Reducing Complexity of HEVC: A Deep Learning Approach

عنوان ژورنال:

اشتراک گذاری